Learning Context Free Grammars in the Limit Aided by the Sample Distribution

Author

  • Yoav Seginer

Abstract

We present an algorithm for learning context free grammars from positive structural examples (unlabeled parse trees). The algorithm receives a parameter in the form of a finite set of structures and the class of languages learnable by the algorithm depends on this parameter. Every context free language belongs to many such learnable classes. A second part of the algorithm is then used to determine this parameter (based on the language sample). By Gold’s theorem, without introducing additional assumptions, there is no way to ensure that, for every language, the parameter chosen by the learner will make the language learnable. However, we show that determining the parameter based on the sample distribution is often reasonable, given some weak assumptions on this distribution. Among other things, repeated learning, where one learner learns the language the previous learner converged to, is guaranteed to produce a learnable language after a finite number of steps. This set of limit languages then forms a natural class of learnable languages.


Similar resources

Learning context-free grammars from stochastic structural information

We consider the problem of learning context-free grammars from stochastic structural data. For this purpose, we have developed an algorithm (tlips) which identifies any rational tree set from stochastic samples and approximates the probability distribution of the trees in the language. The procedure identifies equivalent subtrees in the sample and outputs the hypothesis in linear time with the nu...

Leveraging Lexical Semantics to Infer Context-Free Grammars

Context-free grammars cannot be identified in the limit from positive examples (Gold, 1967), yet natural language grammars are more powerful than context-free grammars and humans learn them with remarkable ease from positive examples (Marcus, 1993). Identifiability results for formal languages ignore a potentially powerful source of information available to learners of natural languages, namely...

Efficient Learning of Context-Free Grammars from Positive Structural Examples

In this paper, we introduce a new normal form for context-free grammars, called reversible context-free grammars, for the problem of learning context-free grammars from positive-only examples. A context-free grammar G = (N, Σ, P, S) is said to be reversible if (1) A → α and B → α in P implies A = B and (2) A → αBβ and A → αCβ in P implies B = C. We show that the class of reversible context...
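The two reversibility conditions are mechanically checkable. The sketch below is not from the paper; the grammar representation (productions as pairs of a left-hand-side nonterminal and a right-hand-side tuple of symbols) is a hypothetical choice made for illustration, but the checks implement conditions (1) and (2) as stated above.

```python
from itertools import combinations

def is_reversible(productions, nonterminals):
    """Check Sakakibara-style reversibility of a context-free grammar.

    productions: iterable of (lhs, rhs) pairs, rhs a tuple of symbols.
    nonterminals: set of nonterminal symbols.
    """
    # Condition (1), invertibility: A -> alpha and B -> alpha imply A = B,
    # i.e. no right-hand side is shared by two distinct left-hand sides.
    rhs_to_lhs = {}
    for lhs, rhs in productions:
        if rhs in rhs_to_lhs and rhs_to_lhs[rhs] != lhs:
            return False
        rhs_to_lhs[rhs] = lhs

    # Condition (2), reset-freeness: A -> alpha B beta and A -> alpha C beta
    # imply B = C, i.e. no two rules with the same left-hand side differ
    # only in a single nonterminal position.
    by_lhs = {}
    for lhs, rhs in productions:
        by_lhs.setdefault(lhs, []).append(rhs)
    for rules in by_lhs.values():
        for r1, r2 in combinations(rules, 2):
            if len(r1) != len(r2):
                continue
            diff = [i for i in range(len(r1)) if r1[i] != r2[i]]
            if len(diff) == 1 and r1[diff[0]] in nonterminals \
                    and r2[diff[0]] in nonterminals:
                return False
    return True
```

For example, the grammar S → aSb | ab is reversible, while a grammar containing both S → a and A → a violates condition (1), and one containing both S → aA and S → aB violates condition (2).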

Learning Stochastic Context-Free Grammars from Corpora Using a Genetic Algorithm

A genetic algorithm for inferring stochastic context-free grammars from finite language samples is described. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. We describe a number of experiments in learning grammars for a range of formal languages. The results of these experiments are encouraging and compare very favour...

Learning Context Free Grammars with the Syntactic Concept Lattice

The Syntactic Concept Lattice is a residuated lattice based on the distributional structure of a language; the natural representation based on this is a context-sensitive formalism. Here we examine the possibility of basing a context-free grammar (cfg) on the structure of this lattice, in particular by choosing non-terminals to correspond to concepts in this lattice. We present a learning algor...


Journal:

Volume   Issue

Pages  -

Publication date: 2003